Abstract

We study the complexity of the following problems in the streaming model.

Membership testing for DLIN. We show that every language in DLIN can be
recognised by a randomized one-pass O(log n) space algorithm with inverse
polynomial one-sided error, and by a deterministic p-pass O(n/p) space algorithm.
We show that these algorithms are optimal.

Membership testing for LL(k). For languages generated by LL(k) grammars with
a bound of r on the number of nonterminals at any stage in the left-most
derivation, we show that membership can be tested by a randomized one-pass
O(r log n) space algorithm with inverse polynomial (in n) one-sided error.

Membership testing for DCFL. We show that randomized algorithms as efficient
as the ones described above for DLIN and LL(k) (which are subclasses of DCFL)
cannot exist for all of DCFL: there is a language in VPL (a subclass of DCFL) for
which any randomized p-pass algorithm with error bounded by ε < 1/2 must
use Ω(n/p) space.

Degree sequence problem. We study the problem of determining, given a
sequence d_1, d_2, ..., d_n and a graph G, whether the degree sequence of G is
precisely d_1, d_2, ..., d_n. We give a randomized one-pass O(log n) space algorithm
with inverse polynomial one-sided error probability. We show that our
algorithms are optimal.

Our randomized algorithms are based on the recent work of Magniez et al.
[1]; our lower bounds are obtained by considering related communication
complexity problems.
1. Introduction

Modeling computational problems as language recognition is 
well-established in theoretical computer science. By studying the complexity 
of recognising languages, one seeks to understand the power and limitations 
of various computational models, and also classify problems according to their 
hardness. In this paper, we study language recognition problems in the data 
stream model. 

The data stream model was invented to understand issues that arise in com- 
putations involving large amounts of data, when the processors have limited 
memory and are allowed limited access to the input (typically, restricted to a 
small number of passes over it). Such a situation arises when the input is in 
secondary storage and it is infeasible to load it all in the main memory. In re- 
cent years, this model has gained popularity for modeling the actions of routers 
and other agents on the internet that need to keep aggregate information about 
the packets that they handle; the number of packets is large, and the routers 
themselves are allowed only a small amount of memory. In this case, the final 
decision needs to be based on just one pass over the input. 

In the data stream model, the two main parameters of interest are the mem- 
ory available for processing and the number of passes allowed. An algorithm 
is considered efficient if the space it uses is significantly smaller than the input
length (ideally, only polylogarithmic), and the number of passes on the input is
small (ideally, just one). Given these constraints, most interesting problems
become intractable in this model if the algorithm is required to be deterministic.
Randomness, however, is remarkably effective, and many interesting randomized
algorithms have been proposed (starting with Alon et al. [2]; see also the
survey by Muthukrishnan [3]).

When the number of passes over the input is not restricted, or when random 
access to the input is available, the data stream model corresponds closely to 
the model of space bounded Turing machines. Often, techniques developed for
such unrestricted space bounded computations carry over to the data stream
model with limited access to inputs (e.g. Nisan's pseudorandom generator [4],
designed for derandomizing space bounded randomized computations, has been
effectively employed in many data stream algorithms, starting with Indyk [5]).
In this paper, we consider streaming algorithms for several language recognition 
problems that can be solved in polylog(n) space on a Turing machine. 

We will assume that the reader is familiar with basic formal language theory,
in particular, the class of context free languages (CFL). Our results concern some
subclasses of CFLs, namely DLIN, LL(k) and DCFL (we recall their definitions in
Sections 2, 3 and 4). Slightly differing definitions for DLIN were first given by
Ibarra et al. [6] and Nasu et al. [7]; the definition we use is due to Higuera et
al. [8], where the several similar definitions are compared and a more general
class is defined. It was shown by Holzer et al. [9] that membership in these
languages can be tested in space O(log n). LL(k) languages were defined by
Lewis et al. [10] and Knuth [11], and they play an important role in parsing
theory. Informally, they are the languages for which the left-most derivation
can be obtained deterministically by making a single pass on the input from left
to right with k-lookaheads. Apart from some technicalities arising from ε-rules
in the grammar, the class LL(k) includes DLIN. It was shown in [12] that all
deterministic context-free languages can be recognised in space O(log² n). In
this paper, we examine if languages in DLIN and LL(k) admit similar efficient
membership testing in the streaming model.

Our work is motivated by a recent membership testing algorithm of Magniez
et al. [1] for the language Dyck_2, which is the language of balanced parentheses
on two types of parentheses. The algorithm uses O(√(n log n)) space. We
apply their fingerprinting based method to the subclass DLIN and also give a
deterministic p-pass, O(n/p) space algorithm.

Theorem 1. For every L ∈ DLIN,

1. there is a randomized one-pass O(log n) space streaming algorithm such
that for x ∈ {0, 1}^n

(a) if x ∈ L then the algorithm accepts with probability 1;

(b) if x ∉ L then the algorithm rejects with probability at least 1 − 1/n.

2. there is a deterministic p-pass O(n/p) space streaming algorithm for
testing membership in L.

(Note that our result does not generalize the result of [1] for Dyck_2, because
Dyck_2 does not belong to DLIN.) However, Theorem 1 cannot be improved.

Theorem 2. Let

\[ \text{1-turn-Dyck}_2 = \{\, w\,\bar{w}^R : w \in \{(, [\}^n,\ n \ge 1 \,\}, \]

where w̄ is the string obtained from w by replacing each opening parenthesis by
its corresponding closing parenthesis; w^R is the reverse of w.

1. Any p-pass randomized streaming algorithm that determines membership
in 1-turn-Dyck_2 with probability of error bounded by ε < 1/2 must use
Ω((log n)/p) space.

2. Any p-pass deterministic streaming algorithm that determines membership
in 1-turn-Dyck_2 must use Ω(n/p) space.

This result is obtained by deriving, from the streaming algorithm, a two- 
party communication protocol for determining if two strings are equal, and 
then appealing to known lower bounds for the communication problem. 

We next investigate if efficient membership testing is possible for languages
in classes larger than DLIN. Similar fingerprinting based algorithms apply to
the class LL(k), but their efficiency depends on a certain parameter based on the
underlying grammar G. In order to state our result precisely, we now define
this parameter.

Let L be a language generated by an LL(k) grammar G. For a string w ∈ L,
let rank_G(w) denote the maximum number of nonterminals in any sentential
form arising in the (unique) leftmost derivation generating w. Let the rank of the
grammar, rank_G : N → N, be defined as

\[ \mathrm{rank}_G(n) = \max_{w \in \{0,1\}^n \cap L(G)} \mathrm{rank}_G(w). \]

We will assume that rank_G(n) is a well-behaved function, say it is log-space
computable.
Theorem 3. Let G be an LL(k) grammar. There is a randomized one-pass
streaming algorithm that, given an input w ∈ {0, 1}^n and a positive integer b,
using space O(b log n) (the dependence on k varies with the grammar),

1. accepts with probability 1 if w ∈ L(G) and rank_G(w) ≤ b;

2. rejects with probability at least 1 − 1/n^c (for a fixed constant c) if w ∉ L(G) or rank_G(w) > b.

Corollary 4. Let L be a language generated by an LL(k) grammar G. There
is a randomized one-pass streaming algorithm that, given an input w ∈ {0, 1}^n,
using space O(rank_G(n) log n),

1. accepts with probability 1 if w ∈ L;

2. rejects with probability at least 1 − 1/n^c if w ∉ L.

Note that the above result does not give efficient streaming algorithms
unconditionally, for the space required depends on rank_G(n), which in general
may grow as Ω(n). Note, however, that results based on such properties of the
derivation have been considered in the literature before. In fact, the class of left
derivation bounded languages defined by Walljasper [13] consists precisely of
languages for which rank_G(n) is a constant independent of n; this class was also
shown to be closed under AFL operations in [13]. Many well-studied classes of
languages are subclasses of left derivation bounded languages. The nonterminal
bounded languages generated by nonterminal bounded grammars (which have
a bounded number of nonterminals in the sentential forms in any derivation)
were studied by Workman [14], who also proved that they contain all ultralinear
languages, which are languages accepted by finite turn pushdown automata
(defined by Ginsburg et al. [15]). However, nonterminal bounded grammars need
not be LL(k).

Despite the dependence on rank_G(n), the above corollary is applicable to
classes of languages such as rest-VPL defined in [16] (which considered the
restriction of VPLs that have LL(1) grammars) and DLINs restricted to grammars
without derivations of the form A → ε. For these classes rank_G(n) is bounded
by a constant independent of n.

We now turn to classes that provably do not admit solutions in the
streaming model with polylogarithmic space. Lower bounds for membership
testing of context-free languages in the streaming model were studied by Magniez
et al. [1]. They proved that any one-pass randomized algorithm requires
Ω(√(n log n)) space for testing membership in Dyck_2. More recently, Jain et
al. [17] proved that if the passes on the input are made only from left to right,
then in spite of making p passes on the input, the membership testing for Dyck_2
requires Ω(√n/p) space. Here we prove that in general for languages in DCFL,
no savings in space over the trivial algorithm of simulating the PDA can be
expected.

Theorem 5. There exists a language L ∈ VPL ⊆ DCFL such that any randomized
p-pass streaming algorithm requires Ω(n/p) space for testing membership in
L with probability of error at most ε < 1/2.

The language L in the above result is a slight modification of Dyck_2. This
result is proved by reducing the membership problem in the streaming model
to the two-party communication problem of checking whether two subsets of
an n-element universe are disjoint.

The upper bounds above show that the method of fingerprinting can be 
fruitfully applied to many problems to check equality of elements located far 
away in the input string. We provide one more illustration of the amazing 
power of this technique. 

Degree-Sequence, Deg-Seq. The degree sequence problem is the following.

Input: A positive integer n and a sequence of directed edges

(u_1, v_1), (u_2, v_2), ..., (u_m, v_m), where u_i, v_i ∈ {1, 2, ..., n},

on vertex set {1, 2, ..., n}.

Task: Determine if vertices 1, 2, ..., n have out-degrees d_1, d_2, ..., d_n,
respectively.

This problem is known to be in log-space (in fact in TC⁰; see for example
[18]). It has been observed [19], [20] that the complexity of graph problems
changes drastically depending on the order in which the input is presented to
the streaming algorithm. If the input to Deg-Seq is such that the degree of a
vertex along with all the edges out of that vertex are listed one after the other,
then checking whether the graph has the given degree sequence is trivial. If the
degree sequence is listed first, followed by the adjacency list of the graph, then
we observe that a one-pass deterministic algorithm needs O(n) space to compute
Deg-Seq. For a more general ordering of the input where the degree sequence is
followed by a list of edges in an arbitrary order, we prove the following theorem:

Theorem 6. If the input is a degree sequence followed by a list of edges in an
arbitrary order, then Deg-Seq can be solved

1. by a one-pass, O(log n) space randomized streaming algorithm such that
if vertices 1, 2, ..., n have out-degrees d_1, d_2, ..., d_n, respectively, then the
algorithm accepts with probability 1, and otherwise it rejects with probability
at least 1 − 1/poly(n).

2. by a p-pass, O((n log n)/p)-space deterministic streaming algorithm.

We also show that the above result is optimal up to a log n factor.

Theorem 7.

1. Any p-pass randomized streaming algorithm for Deg-Seq with probability
of error bounded by ε < 1/2 must use Ω((log n)/p) space.

2. Any p-pass deterministic streaming algorithm for Deg-Seq must use Ω(n/p)
space.
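The algorithm behind Theorem 6(1) is not spelled out in this section. As a hedged illustration only, the Python sketch below shows one natural way fingerprinting could be applied to Deg-Seq: it keeps a single evaluation of the polynomial whose coefficient of x^v is d_v minus the observed out-degree of v. The function name, the choice of polynomial and the requirement that the prime p exceed every d_v and the number of edges are assumptions of this sketch, not statements from the text.

```python
def deg_seq_fingerprint(degrees, edges, p, alpha):
    """One pass over the claimed out-degrees, then over the edge list.

    Keeps only one value in F_p: the evaluation at alpha of
    sum_v (d_v - outdeg(v)) * x^v, which is the zero polynomial exactly
    when the claimed degrees are right (assuming p > max(d_v) and p > m).
    """
    acc = 0
    for v, d in enumerate(degrees, start=1):   # degree sequence comes first
        acc = (acc + d * pow(alpha, v, p)) % p
    for u, _v in edges:                        # edges arrive in arbitrary order
        acc = (acc - pow(alpha, u, p)) % p
    return acc == 0


# Vertices 1, 2, 3 with out-degrees 2, 0, 1.
edges = [(1, 2), (3, 1), (1, 3)]
print(deg_seq_fingerprint([2, 0, 1], edges, p=101, alpha=2))  # correct claim
print(deg_seq_fingerprint([1, 1, 1], edges, p=101, alpha=2))  # wrong claim
```

In use, alpha would be drawn uniformly from F_p; a correct degree sequence is then always accepted, while a wrong one is accepted only when alpha happens to be a root of a nonzero polynomial of degree at most n.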
2. Membership testing of DLIN

In this section, we study the complexity of membership testing for a subclass
of context free languages called DLIN, in the streaming model. Informally it is
the class of languages accepted by 1-turn PDA (i.e. PDA which do not make a
push move after having made a pop move), with restrictions similar to LL(1).

We start with some definitions. See [21] for the basic definitions regarding
context-free grammars (CFG) and pushdown automata (PDA).

Definition 1 (Higuera et al. [8]). A deterministic linear CFG, or DL-CFG, is
a CFG (Σ, N, P, S) for which every production is of the form A → aω or A → ε,
where a ∈ Σ and ω ∈ (N ∪ {ε})Σ*, and for any two productions A → aω and
B → bω′, if A = B then a ≠ b, where a, b ∈ Σ and ω, ω′ ∈ (N ∪ {ε})Σ*.

Definition 2. Deterministic linear CFL, DLIN, is the class of languages for
which there exists a DL-CFG generating it.
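To make the two conditions of Definition 1 concrete, here is a minimal Python sketch that checks them for a grammar given as a list of productions. The representation (pairs of a head and a tuple of symbols, with the empty tuple standing for an ε-rule) and the function name are assumptions made for illustration, and the list is assumed to contain each production only once.

```python
def is_dl_cfg(productions, nonterminals):
    """Check the shape and determinism conditions of a DL-CFG.

    productions: list of (head, body) pairs; body is a tuple of symbols
    and the empty tuple () stands for an epsilon rule.
    """
    seen = set()  # (head, first terminal) pairs already used
    for head, body in productions:
        if not body:
            continue                      # A -> epsilon is always allowed
        first, rest = body[0], body[1:]
        if first in nonterminals:
            return False                  # body must start with a terminal
        if rest and rest[0] in nonterminals:
            rest = rest[1:]               # at most one nonterminal, right after it
        if any(s in nonterminals for s in rest):
            return False                  # the remainder must be terminals only
        if (head, first) in seen:
            return False                  # two rules A -> a... with the same a
        seen.add((head, first))
    return True


anbn = [("S", ("a", "S", "b")), ("S", ())]
clash = [("S", ("a", "S", "b")), ("S", ("a", "S", "c")), ("S", ())]
print(is_dl_cfg(anbn, {"S"}), is_dl_cfg(clash, {"S"}))
```

The second grammar fails because two productions with head S both start with the terminal a, which is exactly the situation the determinism condition rules out.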

DLIN is a well studied class in language theory. Higuera et al. [8] give
algorithms for learning such grammars. Many variations of the above definition
have been considered in earlier works. The above definition is more general
than the ones given in [6, 7], as was proved in [8]. Note that the set of languages
accepted by deterministic 1-turn PDA is a strict super-set of DLIN. For example
L = {a^n b^n or a^n c^n | n > 0} ∉ DLIN but is accepted by a deterministic 1-turn
PDA.

Definition 3. The Canonical Pushdown Automaton or CPDA for a language
L generated by a CFG G = (Σ, N, P, S) is a PDA M_L = (Q = {q}, Σ, Γ =
N ∪ Σ, δ, q_0 = q, S), where the transition function δ is defined as follows:

1. for each production of the form A → aω where ω ∈ (N ∪ Σ)*, δ(q, a, A) =
(q, ω) (see footnote 1);

2. for every production A → ω that is not considered above, δ(q, ε, A) =
(q, ω);

3. for all a ∈ Σ, δ(q, a, a) = (q, ε).

M_L starts with only the start symbol S on the stack and it accepts by empty
stack. The language accepted by M_L is L.

If the rules of the form A → ε are removed from a DL-CFG, then the
corresponding CPDA is deterministic. However, if the length of the string is known
beforehand, then we can infer when such a rule is to be applied. It is
precisely when the length of the string seen so far and the number
of terminals in the stack add up to the total length. So the CPDA can be
simulated deterministically by making only a single pass over the input, but the
stack can take up Ω(n) space. The algorithm for membership testing of DLIN is
obtained by simulating the CPDA with a compressed stack. The stack is
compressed by using a hash function which is a random evaluation of a polynomial
constructed from the stack. This method, commonly known as fingerprinting
(see [22], Chapter 7), was used by Magniez et al. [1] for giving a streaming
algorithm for membership testing of Dyck_2. Here we apply the technique to the
class of DLIN. Note that Dyck_2 is not contained in DLIN.

1 δ(q, a, A) = (q, ω) means that when the PDA is in state q, has a as the next input symbol
and A on top of the stack, it remains in state q and replaces A by ω at the top of the stack.

2.1. Compressing the stack 

First we make an observation about the stack of a CPDA for a language L, 
generated by a DL-CFG. 

Observation 4. For any string w ∈ L and at any step i ∈ [|w|], the stack of
the CPDA contains at most one nonterminal.

Consider the run of the CPDA on w ∈ Σ* in which any transition of the
form δ(q, ε, A) = (q, ε) is applied only at the step i when the sum of i − 1 and
the number of terminals in the stack adds up to |w|. For w ∈ Σ*, i ∈ [|w|],
let Stack(w, i) ∈ Σ* be the sequence of terminals in the stack of the CPDA
when it encounters the ith symbol of input w. We read it from the bottom up to
the first nonterminal, or up to the top if there is no nonterminal. If the CPDA rejects
before reaching i, then Stack(w, i) is not defined. Similarly, let NonTerm(w, i)
be the unique nonterminal on top of the stack of the CPDA when it has reached
position i on input w. If there is no nonterminal in the stack, then it is ε.

We will assume a fixed bijective map from Σ = {a_1, a_2, ..., a_m} to [m] =
{1, 2, ..., m}. Furthermore we will use a_i to denote the value of the map on a_i.
For any string v ∈ Σ*, a prime p, an integer offset l ≥ 0 and a formal variable x, let

\[ \mathrm{FP}(v, l, x, p) = \sum_{j=1}^{|v|} v[j]\, x^{\,l+j-1} \bmod p \]

be a polynomial over F_p. Then

\[ \mathrm{CompStack}(w, i, x) = \mathrm{FP}(\mathrm{Stack}(w, i), 0, x, p) \]

can be considered as an encoding of Stack(w, i).

Observation 5. CompStack(w, n, x) has degree at most n and is the zero
polynomial if and only if w ∈ L.

It is therefore sufficient to check whether CompStack(w, n, x) is the zero
polynomial for testing membership in L. To explicitly store this polynomial,
Ω(n) space may be required. But a random evaluation of a non-zero degree d
polynomial over F_p is zero with probability at most d/p (by the Schwartz-Zippel
Lemma). Hence it suffices to keep a random evaluation of CompStack(w, i, x),
which can be stored using just ⌈log p⌉ bits, for checking if it is zero. If p is
bounded by a polynomial in n, the space needed drops to O(log n).
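The fingerprint can be evaluated incrementally with a handful of counters. The short Python sketch below evaluates FP(v, l, x, p) as the sum of v[j]·x^(l+j−1) over j = 1..|v| modulo p, which is the form used in the proof of Lemma 8, and checks the additivity that lets a stack be extended piece by piece. The function name fp and the use of small integer codes for symbols are illustrative assumptions.

```python
def fp(codes, l, x, p):
    """Evaluate FP(v, l, x, p) = sum_{j=1..|v|} v[j] * x^(l+j-1) mod p.

    codes: the symbols of v already mapped to integers in {1, ..., m}.
    """
    total = 0
    power = pow(x, l, p)            # x^l is the weight of v[1]
    for code in codes:
        total = (total + code * power) % p
        power = (power * x) % p     # next symbol sits one degree higher
    return total


p, x = 101, 2
whole = fp([1, 2, 3], 0, x, p)                       # 1 + 2*2 + 3*4
pieces = (fp([1, 2], 0, x, p) + fp([3], 2, x, p)) % p
print(whole, pieces)                                 # both are 17
```

Pushing a block of symbols onto a stack of height h therefore only requires adding fp(block, h, x, p) to the stored value, and popping the top symbol a only requires subtracting a·x^(h−1).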
Algorithm 1 Randomized one pass algorithm
 1: Input: w ∈ Σ*. Let |w| = n.
 2: Pick α uniformly at random from F_p.
 3: comp_stack ← 0; non_term ← S; h ← 0
 4: for i = 1 to n do
 5:   if non_term ≠ ε then
 6:     if h + i − 1 = n then
 7:       if a rule of the form non_term → ε does not exist then reject
 8:       else non_term ← ε
 9:     else
10:       Find the unique rule of the form below. Otherwise reject
            non_term → w[i] B v,  v ∈ Σ*, B ∈ N ∪ {ε}
11:       comp_stack ← comp_stack + FP(v^R, h, α, p) mod p   {where v^R is v reversed}
12:       non_term ← B; h ← h + |v|
13:     end if
14:   else
15:     comp_stack ← comp_stack − w[i] α^{h−1} mod p
16:     h ← h − 1
17:   end if
18: end for
19: if comp_stack = 0 and h = 0 then accept
20: else reject
2.2. Algorithm

The algorithm (Algorithm 1) is obtained by observing that the CPDA can be
simulated using a compressed stack.

Algorithm 1 uses ⌈log p⌉ bits to store α, ⌈log p⌉ for comp_stack, 2⌈log n⌉ for
i, h and some constant space that depends on the grammar for non_term. Hence
the space complexity is 2⌈log p⌉ + 2⌈log n⌉ + c. It also uses ⌈log p⌉ random bits.
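The following Python sketch simulates Algorithm 1 on a DL-CFG given as a lookup table; it is an illustration under stated assumptions rather than a definitive implementation. The table format (rules[(A, a)] = (B, v) for a production A → a B v, with B set to None when absent), the set eps_heads of nonterminals with an ε-rule, and all function names are assumptions. The sketch reads the pseudocode loosely in two places: after an ε-rule is applied the current symbol is still matched against the stack in the same iteration, as the while loop of Algorithm 2 does, and a final check covers a nonterminal left over at the end of the input.

```python
def fp(codes, l, x, p):
    """FP(v, l, x, p): sum of v[j] * x^(l+j-1) for j = 1..|v|, modulo p."""
    total, power = 0, pow(x, l, p)
    for code in codes:
        total = (total + code * power) % p
        power = (power * x) % p
    return total


def dlin_accepts(w, rules, eps_heads, start, code, p, alpha):
    """One pass over w, keeping only comp_stack, non_term, h and the index i."""
    n = len(w)
    comp_stack, non_term, h = 0, start, 0
    for i in range(1, n + 1):
        c = w[i - 1]
        if non_term is not None:
            if h + i - 1 == n:
                # Remaining input equals the stack height: apply A -> epsilon.
                if non_term not in eps_heads:
                    return False
                non_term = None           # then match c against the stack below
            else:
                rule = rules.get((non_term, c))
                if rule is None:
                    return False
                nxt, v = rule
                rev = [code[s] for s in reversed(v)]
                comp_stack = (comp_stack + fp(rev, h, alpha, p)) % p
                non_term, h = nxt, h + len(v)
                continue
        if h == 0:
            return False                  # input left over but the stack is empty
        comp_stack = (comp_stack - code[c] * pow(alpha, h - 1, p)) % p
        h -= 1
    if non_term is not None and non_term not in eps_heads:
        return False                      # leftover nonterminal cannot vanish
    return comp_stack == 0 and h == 0


# S -> a S b | epsilon, i.e. {a^n b^n}; alpha is fixed here only for repeatability.
rules = {("S", "a"): ("S", "b")}
code = {"a": 1, "b": 2}
for word in ["", "ab", "aabb", "aab", "abb"]:
    print(repr(word), dlin_accepts(word, rules, {"S"}, "S", code, 101, 5))
```

In an actual run alpha would be drawn uniformly from F_p, for example with random.randrange(p); members of the language are then always accepted.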

2.3. Proof of Correctness

Lemma 8. If the input is not rejected on or before the ith iteration of the for loop
on line 4 of Algorithm 1 then

• h = |Stack(w, i)|

• comp_stack = CompStack(w, i, α)

• non_term = NonTerm(w, i).

Proof. The lemma is proved using induction on i. At i = 1, h = 0, comp_stack =
CompStack(w, 1, α) = 0 and non_term = NonTerm(w, 1) = S. Assume the above
is true for the ith iteration of the loop. After the update in line 11, we have

\[ \text{comp\_stack} = \mathrm{CompStack}(w, i, \alpha) + \sum_{j=1}^{|v|} v^R[j]\, \alpha^{h+j-1} \bmod p = \mathrm{CompStack}(w, i+1, \alpha), \]

and after the update in line 15 we have

\[ \text{comp\_stack} = \mathrm{CompStack}(w, i, \alpha) - w[i]\, \alpha^{h-1} \bmod p = \mathrm{CompStack}(w, i+1, \alpha). \]

Similarly h and non_term are updated correctly in lines 8, 12 and 16. □

Applying Lemma 8 for i = n, we get the following corollary.

Corollary 9. If w ∈ L then Algorithm 1 accepts with probability 1.

Lemma 10. If w ∉ L then Pr[Algorithm 1 accepts] ≤ n/p.

Proof. If w ∉ L then the CPDA rejects, say at step j. There are three cases.

1. NonTerm(w, j) was defined and it rejected as a matching rule of the form
NonTerm(w, j) → w[j]ω, ω ∈ Σ*(N ∪ {ε})Σ*, could not be found.

2. NonTerm(w, j) was not defined and it rejected as the last character of
Stack(w, j) was not w[j].

3. The stack was not empty at the end of the string.

In case 1, Algorithm 1 rejects with probability 1. For case 2, the monomial
subtracted by the algorithm is w[j]α^{h−1}. The only other monomial in the
sum with degree h − 1 is aα^{h−1}, where a is the last character of Stack(w, j).
Also, after the jth step no monomial of degree h is subtracted. So the polynomial
for which comp_stack is an evaluation is not the zero polynomial. This is also
true in case 3, as the stack is not empty. The lemma follows by an application
of the Schwartz-Zippel Lemma. □

Theorem 1 is obtained by finding a prime p between n² and 2n² by brute
force search and then using Algorithm 1.
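A minimal Python sketch of the brute force search follows; the function names are illustrative. Trial division needs only a few counters of O(log n) bits each, and Bertrand's postulate guarantees that a prime exists strictly between n² and 2n² whenever n ≥ 2.

```python
def is_prime(m):
    """Trial division; uses only a couple of small counters."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def prime_between_squares(n):
    """Smallest prime p with n^2 < p < 2 n^2 (requires n >= 2)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    for candidate in range(n * n + 1, 2 * n * n):
        if is_prime(candidate):
            return candidate
    raise ValueError("no prime found")  # unreachable by Bertrand's postulate


p = prime_between_squares(10)
print(p, 10 / p < 1 / 10)   # 101 True
```

With such a p, the bound n/p of Lemma 10 is below 1/n.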
2.4. A deterministic multi-pass algorithm

In this section we give a deterministic multi-pass algorithm for the
membership testing of any language in DLIN. This is done by first reducing the
membership testing problem for any L ∈ DLIN to membership testing of a
particular language 1-turn-Dyck_k ∈ DLIN. Recall that Dyck_k is the language generated by
the grammar

\[ S \to SS \mid (_1\, S\, )_1 \mid (_2\, S\, )_2 \mid \cdots \mid (_k\, S\, )_k \mid \varepsilon \]

and 1-turn-Dyck_2 is generated by

\[ S \to (S) \mid [S] \mid \varepsilon. \]

We will be using the following definition of streaming reduction:

Definition 6 (Streaming Reduction). Fix two alphabets Σ_1 and Σ_2. A problem
P_1 is f(n)-streaming reducible to a problem P_2 in space s(n) if for every input
x ∈ Σ_1^n, there exists y = y_1 y_2 ··· y_n, where each y_i is a string over Σ_2 of
length at most f(n), such that:

• y_i can be computed from x_i using space s(n).

• From a solution of P_2 on input y, a solution of P_1 on input x can be
computed in space s(n).

Note that our definition is a slight modification of the definition from [1]
(see footnote 2). In [1] it was observed that the membership testing of Dyck_k
O(log k)-streaming reduces in O(log k) space to membership testing of Dyck_2.
We show that the membership testing for any language in DLIN O(1)-streaming
reduces in O(log n) space to membership testing in 1-turn-Dyck_k, where k is the
alphabet size of the language. It is easy to see that in the reduction of Magniez
et al. [1], the output of the reduction is in 1-turn-Dyck_2 if and only if the input
is in 1-turn-Dyck_k. Hence we have the following theorem:

Theorem 11. The membership testing for any language in DLIN O(log |Σ|)-streaming
reduces in O(log n) space to membership testing in 1-turn-Dyck_2,
where Σ is the alphabet of the language.

Say L is a fixed DLIN, with Σ = {a_1, a_2, ..., a_k}. Given an input w, the
streaming reduction outputs a string w′ ∈ (Σ ∪ Σ̄)* so that w′ is in 1-turn-Dyck_k
if and only if w belongs to L. Here Σ̄ = {ā_1, ā_2, ..., ā_k} and for each i ∈ [k],
(a_i, ā_i) is a matching pair. The streaming reduction is obtained by making a
change to the steps 11 and 15 of Algorithm 1 and is given as Algorithm 2.

2 In [1], the y_i are assumed to be of fixed length, i.e. from Σ_2^{f(n)}.

Algorithm 2 Streaming reduction from L ∈ DLIN to 1-turn-Dyck_k
 1: Input: w ∈ Σ*. Let |w| = n.
 2: Output: w′ ∈ (Σ ∪ Σ̄)*
 3: non_term ← S; w′ ← ε
 4: i ← 1
 5: while i ≤ n do
 6:   if non_term ≠ ε then
 7:     if |w′| + i − 1 = n then
 8:       if a rule of the form non_term → ε does not exist then reject
 9:       else non_term ← ε
10:     else
11:       Find the unique rule of the form below. Otherwise reject
            non_term → w[i] B v,  v ∈ Σ*, B ∈ N ∪ {ε}
12:       w′ ← w′ · v^R
13:       non_term ← B; i ← i + 1
14:     end if
15:   else
16:     w′ ← w′ · w̄[i]; i ← i + 1   {w̄[i] is the closing symbol matching w[i]}
17:   end if
18: end while
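As a rough illustration of the reduction, the Python sketch below follows the steps of Algorithm 2 for a DL-CFG supplied as a table. The table format (rules[(A, a)] = (B, v) for A → a B v, with B set to None when absent), the convention that the upper-case letter stands for the barred (closing) copy of a lower-case terminal, the final leftover-nonterminal check and the helper in_one_turn_dyck are all assumptions of this sketch. The output is collected in a list only for convenience; a streaming implementation would emit each symbol as it is produced and keep just the counters.

```python
def reduce_to_one_turn_dyck(w, rules, eps_heads, start):
    """Return w' as a list of symbols, or None if the reduction rejects."""
    n, out, non_term, i = len(w), [], start, 1
    while i <= n:
        c = w[i - 1]
        if non_term is not None:
            if len(out) + i - 1 == n:
                if non_term not in eps_heads:
                    return None
                non_term = None            # epsilon move: i is not advanced
            else:
                rule = rules.get((non_term, c))
                if rule is None:
                    return None
                nxt, v = rule
                out.extend(reversed(v))    # opening symbols for the pushed v^R
                non_term, i = nxt, i + 1
        else:
            out.append(c.upper())          # closing symbol matching w[i]
            i += 1
    if non_term is not None and non_term not in eps_heads:
        return None                        # assumed extra check at end of input
    return out


def in_one_turn_dyck(s):
    """s = u followed by the barred reverse of u, with u made of opening symbols."""
    half, odd = divmod(len(s), 2)
    if odd:
        return False
    return all(s[j].islower() and s[-1 - j] == s[j].upper() for j in range(half))


palin = {("S", "a"): ("S", "a"), ("S", "b"): ("S", "b"), ("S", "c"): (None, "")}
for word in ["abcba", "abcab"]:
    image = reduce_to_one_turn_dyck(word, palin, set(), "S")
    print(word, "".join(image), in_one_turn_dyck(image))
```

For the grammar S → a S a | b S b | c, the word abcba maps to abBA, which is in 1-turn-Dyck_k, while abcab maps to abAB, which is not.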
From Theorem 11, we know that any language in DLIN O(log |Σ|)-streaming
reduces to 1-turn-Dyck_2. Thus it suffices to give a p-pass, O(n/p)-space
deterministic algorithm for membership testing of 1-turn-Dyck_2.

The algorithm divides the string into blocks of length n/2p. Let the blocks be
called B_0, B_1, ..., B_{2p−1} from left to right (i.e. B_i = w[i(n/2p) + 1] w[i(n/2p) +
2] ... w[(i + 1)n/2p]). The algorithm considers the pair of blocks (B_j, B_{2p−(j+1)})
during the jth pass. Using the stack explicitly, the algorithm checks whether
the string formed by the concatenation of B_j and B_{2p−(j+1)} is balanced. If it is
balanced, it proceeds to the next pair of blocks. The number of passes required
is p. Each pass uses O(n/p) space and the algorithm is deterministic. Later in
Section 4 we show that this algorithm is optimal.
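The block-pairing idea can be sketched in Python as follows. This is a hedged illustration: the function name, the stream_factory callable (which returns a fresh iterator over the input, standing in for one pass), and the handling of lengths not divisible by 2p by rounding the block size up are assumptions. The sketch also insists that the left block consist of opening parentheses only, which is what membership in 1-turn-Dyck_2 requires.

```python
def one_turn_dyck2_multipass(stream_factory, n, p):
    """Decide membership in 1-turn-Dyck_2 with at most p passes.

    Pass j stores block B_j of the first half on an explicit stack and
    matches it against the mirror block B_{2p-(j+1)} of the second half,
    so the stack never holds more than ceil(n / 2p) symbols.
    """
    closing = {"(": ")", "[": "]"}
    if n % 2:
        return False
    half = n // 2
    block = -(-half // p) if half else 0         # ceil(half / p)
    for j in range(p):
        lo, hi = j * block, min((j + 1) * block, half)
        if lo >= hi:
            break                                # no more blocks to check
        stack = []
        for pos, c in enumerate(stream_factory()):   # one pass over the input
            if lo <= pos < hi:
                if c not in closing:
                    return False                 # first half must be opening symbols
                stack.append(c)
            elif n - hi <= pos < n - lo:
                if not stack or closing[stack.pop()] != c:
                    return False
    return True


for word in ["([[]])", "([)]", "()()"]:
    print(word, one_turn_dyck2_multipass(lambda: iter(word), len(word), 2))
```

Increasing p shrinks the per-pass stack at the cost of more passes, which is the O(n/p) trade-off described above.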
3. Membership Testing of LL(k) languages

In this section we give a randomized streaming algorithm for testing
membership in LL(k) languages. Let G = (N, Σ, P, S) be a fixed grammar. For a
string w ∈ Σ*, let

\[ \mathrm{pref}_k(w) = \begin{cases} w & \text{if } |w| \le k, \\ \text{the first } k \text{ characters of } w & \text{if } |w| > k. \end{cases} \]

The select set of a production A → α, where A ∈ N and α ∈ (N ∪ Σ)*, is

\[ \mathrm{SELECT}(A \to \alpha) = \{\, u \mid \exists v, w \in \Sigma^*,\ \alpha v \text{ derives } w \text{ and } \mathrm{pref}_k(w) = u \,\}. \]

Definition 7 (Lewis et al. [10]). A grammar G = (N, Σ, P, S) is LL(k) if for any
two distinct productions of the form A → α, A → β, the select sets are disjoint.
LL(k) languages are the class of languages generated by LL(k) grammars.

From now on, we describe an algorithm for LL(1) languages. It is easy to
observe that it generalises to LL(k) languages. Let L be a language generated
by an LL(1) grammar G. It is known that for any two distinct rules R ≠ R′ in
the production set of G with the same left side, SELECT(R) and SELECT(R′) are
disjoint. We call this the LL(1) property. Note that DL-CFGs with no epsilon
rules have this property. Therefore, languages generated by DL-CFGs with no
epsilon rules are a subclass of LL(1). As noted by Kurki-Suonio, they are
in fact a proper subclass of languages generated by LL(1) grammars with no
epsilon rules. As part of the preprocessing, for every rule R of the grammar
we compute the set SELECT(R). This requires only O(1) space as the grammar
is fixed.

Our membership testing algorithm for DLIN uses the LL(1) property
non-trivially. Algorithm 1 can be thought of as working in two main steps. The
first step involves reading a terminal from the input and deciding the next rule
to be applied. The second step consists of updating the stack appropriately.
The LL(1) property enables the CPDA to deterministically decide the next rule
to be applied having seen the next input terminal. Therefore, the first step will
remain unchanged even in the case of membership testing of LL(1) languages.
In what follows we describe the second step.

Let Γ_k γ_k ... Γ_0 γ_0, with γ_i ∈ Σ* and Γ_i ∈ N, be any sentential form arising in
the derivation of w ∈ L. Then the corresponding CPDA will store this in the
stack (in the above order from top to bottom). It is easy to see that the CPDA
is generating the leftmost derivation of w. The space efficient algorithm that
we give below compresses the strings γ_i as before and stores Γ_i, the compression
of γ_i, and |γ_i| as a tuple on the stack. For a string w ∈ L, the algorithm runs in
space O(rank_G(w)(log p + log n)), where p is the size of the field over which the
polynomial is evaluated.

3.1. Streaming algorithm for testing membership in LL(1) languages

Given below is the randomized streaming algorithm for testing membership
in LL(1) languages.
Algorithm 3 Randomized one pass algorithm
 1: Input: w ∈ Σ*. Let |w| = n.
 2: Pick α uniformly at random from F_p.
 3: comp_part ← 0; non_term ← S; h ← 0
 4: comp_stack.push(comp_part, non_term, h)
 5: i ← 1
 6: while i ≤ n and comp_stack not empty do
 7:   (comp_part, non_term, h) ← comp_stack.pop()
 8:   if non_term ≠ ε then
 9:     Find the unique rule R of the form below such that w[i] ∈ SELECT(R).
        Otherwise reject.
          non_term → B_t β_t B_{t−1} β_{t−1} ··· B_0 β_0, where all β_i ∈ Σ* and B_i ∈ N ∪ {ε}
10:     comp_stack.push(comp_part + FP(β_0^R, h, α, p), B_0, h + |β_0|)
11:     for k ← 1 to t do
12:       comp_stack.push(FP(β_k^R, 0, α, p), B_k, |β_k|)   \\ β_k^R = reverse(β_k)
13:     end for
14:   else
15:     if h ≠ 0 then
16:       comp_part ← comp_part − w[i] α^{h−1} mod p; h ← h − 1
17:       comp_stack.push(comp_part, non_term, h)
18:     else if comp_part ≠ 0 then
19:       reject
20:     end if
21:     i ← i + 1
22:   end if
23: end while
24: if comp_stack is not empty then reject else accept
The algorithm uses ⌈log p⌉ space to store α, and ⌈log p⌉ + O(1) + log n space to
store a tuple on the stack. On input w, the space used by the algorithm is
at most rank_G(w)(log n + ⌈log p⌉ + O(1)). Therefore, for a language generated
by grammar G, the space used by the algorithm for checking w ∈ L is at most
O(rank_G(n)(log n + log p)). For proving Theorem 3, Algorithm 3 can be modified
to take an additional parameter b as an input and reject when w ∉ L or the
number of items in the stack exceeds b. Also p can be set to a prime between
n² and 2n², which can be found by brute force search, so that the error probability
is n/p < 1/n.
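The tuple-based compressed stack can be sketched in Python as below. This is an illustration under assumptions, not a transcription of Algorithm 3: the LL(1) parse table is supplied by hand (table[(A, lookahead)] gives the right-hand side as a list of (B, β) pairs read left to right, with "$" as an assumed end-of-input marker), and a tuple whose height has dropped to 0 is checked for a zero fingerprint and discarded without consuming input, which is how the proof of Lemma 12 describes the intended behaviour.

```python
def fp(codes, l, x, p):
    """FP(v, l, x, p): sum of v[j] * x^(l+j-1) for j = 1..|v|, modulo p."""
    total, power = 0, pow(x, l, p)
    for code in codes:
        total = (total + code * power) % p
        power = (power * x) % p
    return total


def ll1_accepts(w, table, start, code, p, alpha, end="$"):
    """Simulate the CPDA with a stack of (comp_part, non_term, h) tuples."""
    n, i = len(w), 1
    stack = [(0, start, 0)]
    while stack:
        comp_part, non_term, h = stack.pop()
        look = w[i - 1] if i <= n else end
        if non_term is not None:
            rhs = table.get((non_term, look))
            if rhs is None:
                return False                      # no rule selected by the lookahead
            b0, beta0 = rhs[-1]                   # B_0 beta_0 joins the tuple below
            rev0 = [code[s] for s in reversed(beta0)]
            stack.append(((comp_part + fp(rev0, h, alpha, p)) % p, b0, h + len(beta0)))
            for b, beta in reversed(rhs[:-1]):    # B_1 beta_1 up to B_t beta_t
                rev = [code[s] for s in reversed(beta)]
                stack.append((fp(rev, 0, alpha, p), b, len(beta)))
        elif h == 0:
            if comp_part != 0:
                return False                      # some earlier terminal mismatched
        else:
            if i > n:
                return False                      # stack expects more input
            comp_part = (comp_part - code[w[i - 1]] * pow(alpha, h - 1, p)) % p
            stack.append((comp_part, None, h - 1))
            i += 1
    return i == n + 1                             # all input must be consumed


# S -> ( S ) S | [ S ] S | epsilon, an LL(1) grammar for Dyck_2.
EPS = [(None, "")]
table = {("S", "("): [(None, "("), ("S", ")"), ("S", "")],
         ("S", "["): [(None, "["), ("S", "]"), ("S", "")],
         ("S", ")"): EPS, ("S", "]"): EPS, ("S", "$"): EPS}
code = {"(": 1, ")": 2, "[": 3, "]": 4}
for word in ["([])", "()[]", "([)]", "("]:
    print(word, ll1_accepts(word, table, "S", code, 101, 5))
```

The number of tuples on the stack tracks the number of nonterminals in the current sentential form, so for this particular grammar the space grows with the nesting depth; the example is meant to show the mechanics rather than a small-rank language.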

3.2. Correctness of the algorithm 

In this section we prove the correctness of the algorithm. Note that, given 
an LL(1) grammar, the simulating CPDA performs a top-down parsing of the 
grammar. On reading a symbol from the input, and the top of the stack, it 
deterministically picks a rule to be applied next. If no such rule exists, it halts 
and rejects. If such a rule is found, it pushes the right hand side into the stack. 
As long as the stack-top is a nonterminal it repeats this process. If the stack-top
is a terminal, it pops the top terminal from the stack, provided it matches
the next input letter. If there is a mismatch, it halts and rejects. If after
processing the whole string the stack is empty, it accepts.

We now prove that the working of the algorithm has a close correspondence 
with the working of the CPDA. 

Lemma 12. Let Stack(t) = Γ_k γ_k ··· Γ_0 γ_0, with Γ_i ∈ N and γ_i ∈ Σ*, be the contents
of the stack of the CPDA before the t-th step (counted in terms of applications of the
transition function) and

stack(t) = [(comp_part_j, non_term_j, h_j), ..., (comp_part_0, non_term_0, h_0)]

be the contents of the stack of Algorithm 3 before the t-th iteration of the while
loop in line 6. If the CPDA has not rejected on or before step t then j = k and,
for all i ∈ {0, ..., k},

• comp_part_i = FP(γ_i, 0, α, p) (see footnote 3)

• non_term_i = Γ_i

• h_i = |γ_i|

Proof. The lemma can be proved by induction on t. At t = 1, Stack(1) =
S, stack(1) = [(0, S, 0)] (due to the initialisation steps 3, 4) and the lemma is
true. Suppose it is true at step t; we will prove that the lemma holds at step
t + 1. We consider various cases. Assume that Γ_k ≠ ε in Stack(t). Then,
by the inductive hypothesis, the stack-top maintained by the algorithm has Γ_k in its
second component, and steps 9, 10, 12, 16 make sure that the updates are made
correctly. Suppose Γ_k = ε and γ_k = av, a ∈ Σ, v ∈ Σ*; then by the inductive
hypothesis, the topmost item of stack(t) is (FP(γ_k, 0, α, p), ε, |γ_k|). By definition
FP(γ_k, 0, α, p) = aα^{|γ_k|−1} + FP(v, 0, α, p). If |v| > 0 then after the execution of
step 17, this will become (FP(v, 0, α, p), ε, |v|), which corresponds to the topmost
item of Stack(t + 1). On the other hand, if v = ε, this item is not pushed back onto
the stack. □

3 FP was defined in Section 2.1.

Lemma 13. If $w \in L$ then the algorithm accepts with probability 1. If $w \notin L$ 
then the probability that the algorithm accepts is at most $n/p$. 

Proof. If $w \in L$ then by Lemma 12 the algorithm always accepts. 
Now suppose $w \notin L$, so that the CPDA rejects at a certain step $t$, when the symbol at 
position $t'$ of the input was accessed. There are three cases: 

1. CPDA had a non-terminal on the stack-top and it rejected as a matching 
rule to be applied could not be found. 

2. CPDA had a terminal on the stack-top, say $a$, and it rejected because 
$a \neq w[t']$. 

3. the stack was not empty at the end of the string. 

In Case 1, the algorithm rejects with probability 1. For Case 2, let the 
topmost item in the stack of the algorithm at step $t$ be (comp_part, non_term, h). 
Then the algorithm subtracts $w[t']\alpha^{h-1}$ from comp_part and decreases the height 
by 1. The only other monomial in comp_part with degree $h - 1$ is $a\alpha^{h-1}$. 
Since $a \neq w[t']$, the coefficient of $\alpha^{h-1}$ is nonzero, and 
hence comp_part is a random evaluation of a nonzero polynomial of degree 
at most $n$. From Lemma 12, non_term = $\epsilon$ and hence no other monomial of 
degree $h - 1$ is added to or subtracted from comp_part. Now either the stack item 
(comp_part, non_term, h) is never popped, or at the time of popping comp_part 
is checked to be zero. In the former case, the algorithm rejects with probability 
1 and in the latter with probability at least $1 - n/p$. In Case 3 the algorithm 
rejects with probability 1. □ 

Now Theorem 3 follows from the above lemma by choosing 
$p$ to be a prime between $n^{c+1}$ and $2n^{c+1}$. Such a prime can be 
found in time polynomial in $n$ by exhaustive search. 
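The exhaustive search for $p$ can be sketched as follows. This is a minimal illustration, not the authors' procedure: the names `is_prime` and `pick_prime` are hypothetical, and trial division is used only because it is simple and still polynomial in $n$ for a fixed constant $c$. Bertrand's postulate guarantees that the interval contains a prime.

```python
def is_prime(m: int) -> bool:
    """Primality test by trial division up to sqrt(m)."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def pick_prime(n: int, c: int) -> int:
    """Return the smallest prime p with n^(c+1) <= p <= 2 * n^(c+1)."""
    lo = n ** (c + 1)
    for p in range(lo, 2 * lo + 1):
        if is_prime(p):
            return p
    # Unreachable for lo >= 1 by Bertrand's postulate.
    raise ValueError("no prime found in the interval")
```

With such a $p$, the error bound $n/p$ of Lemma 13 is at most $n / n^{c+1} = n^{-c}$.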
4. Lower bounds for membership testing 



In this section, we prove that the algorithms given in Section [5] are optimal. 

Proof of Theorem 2. We reduce the two-party communication problem of testing 
equality of strings (for all $x, y \in \{0,1\}^n$, EQUALITY$(x, y) = 1 \Leftrightarrow x = y$) to 
membership testing for 1-turn-Dyck2. In this communication problem, the first 
party, Alice, is given a string $x$ and the other party, Bob, is given a string $y$, 
and they need to communicate to determine whether $x$ and $y$ are equal. 

Suppose there is a $p$-pass streaming algorithm for 1-turn-Dyck2 using space $s$. 
We will show that such an algorithm leads to a protocol for the communication 
problem, where the total communication is $(2p - 1)s$. First, Alice and Bob 
transform their inputs as follows. Let $x'$ be the string obtained from $x$ by 
replacing every 0 by a [ and every 1 by a (; let $y'$ be the string obtained from 
$y$ by first reversing it and then replacing 0, 1 by ], ) respectively. Note that the 
string $z = x'y' \in \{(, [, ], )\}^{2n}$ belongs to 1-turn-Dyck2 iff $x = y$. Alice and Bob 
simulate the streaming algorithm on $z$ in the following natural way: Alice runs 
the streaming algorithm on $x'$ and, on reaching the end of her input, passes 
the contents of the memory to Bob, who continues the simulation on $y'$ and 
passes the contents of the memory back to Alice at the end. If the algorithm 
makes $p$ (left to right) passes, then during the simulation the contents of the 
memory change hands $2p - 1$ times. If the algorithm is deterministic, the protocol 
is deterministic. If the algorithm is randomized, the protocol is randomized and 
has the same error probability. 

Since any deterministic protocol for EQUALITY$(x, y)$ requires $n$ bits of 
communication and any randomized protocol requires $\Omega(\log n)$ bits of communication 
(for error bounded by a constant strictly less than $1/2$) (see, for example, [24]), 
both our claims follow immediately. □ 
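The input transformation in the proof above can be checked mechanically on small instances. The sketch below is an illustration only: the function names are hypothetical, `in_one_turn_dyck2` is a plain (non-streaming) reference test that assumes 1-turn-Dyck2 consists of a block of opening brackets followed by the matching block of closing brackets, and `memory_handoffs` merely counts how often the memory changes hands in the simulation.

```python
from itertools import product

OPEN = {"0": "[", "1": "("}
CLOSE = {"0": "]", "1": ")"}
MATCH = {"[": "]", "(": ")"}


def encode(x: str, y: str) -> str:
    """Build z = x'y' from the bit strings x (Alice) and y (Bob)."""
    x_prime = "".join(OPEN[b] for b in x)
    y_prime = "".join(CLOSE[b] for b in reversed(y))
    return x_prime + y_prime


def in_one_turn_dyck2(z: str) -> bool:
    """Reference membership test: opens, then the mirrored matching closes."""
    if len(z) % 2:
        return False
    half = len(z) // 2
    opens, closes = z[:half], z[half:]
    return all(MATCH.get(o) == c for o, c in zip(opens, reversed(closes)))


def memory_handoffs(passes: int) -> int:
    """Count how often the memory changes hands over the given number of passes."""
    count = 0
    for k in range(passes):
        count += 1  # Alice -> Bob after reading x'
        if k < passes - 1:
            count += 1  # Bob -> Alice before the next pass
    return count


def reduction_is_correct(n: int) -> bool:
    """Check z in 1-turn-Dyck2 iff x == y for all x, y of length n."""
    strings = ["".join(bits) for bits in product("01", repeat=n)]
    return all(
        in_one_turn_dyck2(encode(x, y)) == (x == y)
        for x in strings
        for y in strings
    )
```

For example, `encode("01", "01")` gives `[()]`, which is accepted, whereas `encode("01", "11")` gives `[())`, which is not.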

We now establish our lower bound for DCFLs. 

Proof of the DCFL lower bound. Consider the language $L$ generated by the CFG with rules 

$$S \rightarrow [S] \mid [S) \mid (S] \mid \epsilon.$$ 

Note that $L$ is in DCFL; in fact, it is a VPL. 

It is easy to verify that two strings $x, y \in \{0, 1\}^n$ represent characteristic 
vectors of disjoint subsets of $\{1, 2, \ldots, n\}$ iff the string $x'y' \in L$, where $x'$ is 
obtained from $x$ and $y'$ from $y$ exactly as in the proof of Theorem 2. Thus, a 
$p$-pass space-$s$ streaming algorithm for membership testing in $L$ can be used to 
derive a protocol for the set disjointness problem using communication 
$(2p - 1)s$. Since the bounded-error randomized communication complexity of the set 
disjointness problem is $\Omega(n)$ (see [13]), our claim follows immediately. □ 
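The claimed equivalence between disjointness and membership in $L$ can be verified exhaustively for small $n$. This is a sketch under assumptions: the names are hypothetical, the encoding is the one described for the equality reduction (0 becomes a square bracket, 1 a round bracket, with $y$ reversed), and `in_L` is a direct reference test for the grammar rather than a streaming algorithm.

```python
from itertools import product

OPEN = {"0": "[", "1": "("}
CLOSE = {"0": "]", "1": ")"}
# Bracket pairs produced by the rules S -> [S] | [S) | (S]; "(" with ")" is excluded.
ALLOWED = {("[", "]"), ("[", ")"), ("(", "]")}


def encode(x: str, y: str) -> str:
    """Build x'y' from the characteristic vectors x and y."""
    return "".join(OPEN[b] for b in x) + "".join(CLOSE[b] for b in reversed(y))


def in_L(z: str) -> bool:
    """Reference membership test: every nested pair must be an allowed pair."""
    if len(z) % 2:
        return False
    half = len(z) // 2
    return all((o, c) in ALLOWED for o, c in zip(z[:half], reversed(z[half:])))


def disjoint(x: str, y: str) -> bool:
    """True iff no index carries a 1 in both characteristic vectors."""
    return not any(a == "1" and b == "1" for a, b in zip(x, y))


def reduction_is_correct(n: int) -> bool:
    """Check x'y' in L iff x and y are disjoint, for all x, y of length n."""
    strings = ["".join(bits) for bits in product("01", repeat=n)]
    return all(
        in_L(encode(x, y)) == disjoint(x, y) for x in strings for y in strings
    )
```

The only forbidden pair is a round opening bracket matched with a round closing bracket, which corresponds exactly to an index $i$ with $x_i = y_i = 1$.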
5. Streaming algorithms for checking degree sequence of graphs 

In this section, we study the complexity of solving the problem Deg-Seq 
defined in Section 1. We present the proof of the first part of Theorem 5. 

Proof of part 1 of Theorem 5. We construct a univariate polynomial from 
the given degree sequence and the set of edges such that the polynomial is 
identically zero if and only if the graph has the given degree sequence. 

We do not store the polynomial explicitly. Instead, we evaluate this 
polynomial at a random point chosen from a large enough field and only maintain 
the evaluation of the polynomial. The Schwartz-Zippel lemma [22] gives us 
that with high probability the evaluation will be non-zero if the polynomial is 
non-zero. (If the polynomial is identically zero, its evaluation will also be zero.) 

Let the vertex set of the graph be $\{1, \ldots, n\}$. The univariate polynomial 
that we construct is: 

$$q(x) = \sum_{i=1}^{n} d_i x^i - \sum_{i=1}^{m} x^{u_i}$$ 

The algorithm can now be described as follows: 



Algorithm 4 Randomized streaming algorithm for Deg-Seq 
  Pick $\alpha \in_R \mathbb{F}_p$ ($p$ will be fixed later). 
  Sum ← 0 
  for i = 1 to n do 
    Sum ← Sum + $d_i \alpha^i$ 
  end for 
  for i = 1 to m (where m is the number of edges) do 
    Sum ← Sum − $\alpha^{u_i}$ 
  end for 
  if Sum = 0 then 
    accept 
  else 
    reject 
  end if 



It is easy to see that the algorithm requires only logarithmic space as long as $p$ 
is $O(\mathrm{poly}(n))$. The input is read only once from left to right. For 
correctness, note that if the given degree sequence corresponds to that of the 
given graph, then $q(x)$ is identically zero and the value of Sum is also zero 
for any randomly picked $\alpha$. We know that $q(x)$ is non-zero when the given 
degree sequence does not correspond to that of the given graph. However, the 
evaluation may still be zero. Note that the degree of $q(x)$ is $n$. If the field size 
is chosen so that $n^{1+c} < p < n^{2+c}$, then by the Schwartz-Zippel lemma [22] the 
probability that Sum is zero given that $q(x)$ is non-zero is at most $n/p$, which 
is at most $n^{-c}$. □ 
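Algorithm 4 can be sketched in a few lines. This is an illustration under assumptions rather than the authors' implementation: the name `deg_seq_accepts` is hypothetical, edges are given as pairs `(u, v)`, and $u_i$ is taken to be the first endpoint of the $i$-th edge (so the check concerns how often each vertex occurs as a first endpoint). The optional `alpha` argument exists only to make the behaviour reproducible.

```python
import random


def deg_seq_accepts(degrees, edges, p, alpha=None):
    """One-pass fingerprint test; keeps only alpha and a running sum mod p.

    degrees: iterable d_1, ..., d_n read first
    edges:   iterable of pairs (u, v) read afterwards, vertices in 1..n
    p:       a prime modulus (assumed to satisfy n^(1+c) < p < n^(2+c))
    """
    if alpha is None:
        alpha = random.randrange(p)  # alpha chosen uniformly from F_p
    total = 0
    for i, d in enumerate(degrees, start=1):
        total = (total + d * pow(alpha, i, p)) % p  # Sum += d_i * alpha^i
    for u, _v in edges:
        total = (total - pow(alpha, u, p)) % p  # Sum -= alpha^(u_i)
    return total == 0
```

As a worked case, take degrees `[2, 1, 0]` and edges `[(1, 2), (2, 3), (2, 1)]`. Then $q(x) = 2x + x^2 - x - 2x^2 = x - x^2$, which is non-zero but vanishes at $\alpha \in \{0, 1\}$, so over $\mathbb{F}_{101}$ exactly two of the 101 choices of $\alpha$ lead to a false accept, within the bound $n/p$ with $n = 3$.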
Now we give a $p$-pass, $O((n \log n)/p)$-space deterministic algorithm for Deg-Seq 
and hence prove part 2 of the theorem. In each pass, the algorithm stores the degrees 
of $n/p$ vertices and checks whether those vertices have exactly the 
degrees stored. If they do, it proceeds to 
the next set of $n/p$ vertices. The algorithm needs to store $O((n \log n)/p)$ bits 
during any pass, and it makes $p$ passes. 
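The multi-pass algorithm can be sketched as follows. The details here are assumptions made for illustration: the name `deg_seq_multipass` is hypothetical, the stream is modelled as a function `make_stream` that can be called once per pass and yields tokens `("deg", i, d_i)` followed by tokens `("edge", u, v)`, and an edge is counted towards its first endpoint.

```python
import math


def deg_seq_multipass(n, make_stream, passes):
    """Deterministic check using `passes` passes and about n/passes counters."""
    block = math.ceil(n / passes)
    for k in range(passes):
        lo, hi = k * block + 1, min((k + 1) * block, n)
        remaining = {}  # vertex -> claimed degree minus edges seen so far
        for kind, a, b in make_stream():  # one left-to-right pass
            if kind == "deg" and lo <= a <= hi:
                remaining[a] = b
            elif kind == "edge" and lo <= a <= hi:
                remaining[a] -= 1
        if any(r != 0 for r in remaining.values()):
            return False  # some vertex in this block has the wrong degree
    return True
```

Each counter holds a number of magnitude at most polynomial in $n$, so a pass uses $O((n \log n)/p)$ bits when the number of passes is $p$.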

Finally, we show that both the algorithms presented for Deg-Seq are optimal 
up to a $\log n$ factor, by proving Theorem 7. 

Proof of Theorem 7. We reduce the two-party communication problem of testing 
equality to Deg-Seq. Given strings $x, y \in \{0, 1\}^n$ we obtain a degree 
sequence $d = (d_1, d_2, \ldots, d_n)$ and a list of edges $e_1 e_2 \cdots e_m$. Take $d_i = x_i$ and, 
for each $i$ such that $y_i = 1$, add an edge $(i, i)$. Clearly EQUALITY$(x, y) = 1$ if and 
only if $d$ is the degree sequence of the graph with edges $e_1 e_2 \cdots e_m$. Again, as in the 
proof of Theorem 2, the theorem follows from the known communication 
complexity lower bounds for EQUALITY. □ 
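The reduction from EQUALITY to Deg-Seq can be checked exhaustively for small $n$. The sketch below is illustrative only: the function names are hypothetical, and a self-loop $(i, i)$ is assumed to contribute exactly 1 to the degree of vertex $i$, which is what the reduction needs.

```python
from collections import Counter
from itertools import product


def reduce_equality_to_deg_seq(x: str, y: str):
    """Alice's string gives the degrees, Bob's string gives the edges."""
    degrees = [int(b) for b in x]  # d_i = x_i
    edges = [(i, i) for i, b in enumerate(y, start=1) if b == "1"]
    return degrees, edges


def has_degree_sequence(degrees, edges) -> bool:
    """Exact (non-streaming) check, counting each edge at its first endpoint."""
    count = Counter(u for u, _v in edges)
    return all(count[i] == d for i, d in enumerate(degrees, start=1))


def reduction_is_correct(n: int) -> bool:
    """Check that the instance is a yes-instance iff x == y."""
    strings = ["".join(bits) for bits in product("01", repeat=n)]
    return all(
        has_degree_sequence(*reduce_equality_to_deg_seq(x, y)) == (x == y)
        for x in strings
        for y in strings
    )
```

Since Alice can produce the degree part of the stream from $x$ alone and Bob the edge part from $y$ alone, a streaming algorithm for Deg-Seq yields a communication protocol in the same way as in the membership lower bound.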